25 research outputs found

    Multiple plasmid origin-of-transfer regions might aid the spread of antimicrobial resistance to human pathogens

    Antimicrobial resistance poses a great danger to humanity, in part due to the widespread horizontal gene transfer of plasmids via conjugation. Modeling of plasmid transfer is essential to uncovering the fundamentals of resistance transfer and for the development of predictive measures to limit the spread of resistance. However, a major limitation in the current understanding of plasmids is the incomplete characterization of the conjugative DNA transfer mechanisms, which conceals the actual potential for plasmid transfer in nature. Here, we consider that the plasmid-borne origin-of-transfer substrates encode specific DNA structural properties that can facilitate finding these regions in large datasets and develop a DNA structure-based alignment procedure for typing the transfer substrates that outperforms sequence-based approaches. Thousands of putative DNA transfer substrates are identified, showing that plasmid mobility can be twofold higher and span almost twofold more host species than is currently known. Over half of all putative mobile plasmids contain the means for mobilization by conjugation systems belonging to different mobility groups, which can hypothetically link previously confined host ranges across ecological habitats into a robust plasmid transfer network. This hypothetical network is found to facilitate the transfer of antimicrobial resistance from environmental genetic reservoirs to human pathogens, which might be an important driver of the observed rapid resistance development in humans and thus an important point of focus for future prevention measures.

    Parallel Factor Analysis Enables Quantification and Identification of Highly Convolved Data-Independent-Acquired Protein Spectra

    The latest high-throughput mass spectrometry-based technologies can record virtually all molecules from complex biological samples, providing a holistic picture of proteomes in cells and tissues and enabling an evaluation of the overall status of a person's health. However, current best practices are still only scratching the surface of the wealth of available information obtained from the massive proteome datasets, and efficient novel data-driven strategies are needed. Powered by advances in GPU hardware and open-source machine-learning frameworks, we developed a data-driven approach, CANDIA, which disassembles highly complex proteomics data into the elementary molecular signatures of the proteins in biological samples. Our work provides a performant and adaptable solution that complements existing mass spectrometry techniques. As the central mathematical methods are generic, other scientific fields that are dealing with highly convolved datasets will benefit from this work.

    Toward learning the principles of plant gene regulation

    Advanced machine learning (ML) algorithms produce highly accurate models of gene expression, uncovering novel regulatory features in nucleotide sequences involving multiple cis-regulatory regions across whole genes and structural properties. These broaden our understanding of gene regulation and point to new principles to test and adopt in the field of plant science.

    DNA structure at the plasmid origin-of-Transfer indicates its potential transfer range

    Horizontal gene transfer via plasmid conjugation enables antimicrobial resistance (AMR) to spread among bacteria and is a major health concern. The range of potential transfer hosts of a particular conjugative plasmid is characterised by its mobility (MOB) group, which is currently determined based on the amino acid sequence of the plasmid-encoded relaxase. To facilitate prediction of plasmid MOB groups, we have developed a bioinformatic procedure based on analysis of the origin-of-Transfer (oriT), a merely 230 bp long non-coding plasmid DNA region that is the enzymatic substrate for the relaxase. By computationally interpreting conformational and physicochemical properties of the oriT region, which facilitate relaxase-oriT recognition and initiation of nicking, MOB groups can be resolved with over 99% accuracy. We have shown that oriT structural properties are highly conserved and can be used to discriminate among MOB groups more efficiently than the oriT nucleotide sequence. The procedure for prediction of MOB groups and potential transfer range of plasmids was implemented using published data and is available at http://dnatools.eu/MOB/plasmid.HTML

    Learning the Regulatory Code of Gene Expression

    Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.

    Plastic-Degrading Potential across the Global Microbiome Correlates with Recent Pollution Trends

    Biodegradation is a plausible route toward sustainable management of the millions of tons of plastic waste that have accumulated in terrestrial and marine environments. However, the global diversity of plastic-degrading enzymes remains poorly understood. Taking advantage of global environmental DNA sampling projects, here we constructed hidden Markov models from experimentally verified enzymes and mined ocean and soil metagenomes to assess the global potential of microorganisms to degrade plastics. By controlling for false positives using gut microbiome data, we compiled a catalogue of over 30,000 nonredundant enzyme homologues with the potential to degrade 10 different plastic types. While differences between the ocean and soil microbiomes likely reflect the base compositions of these environments, we find that ocean enzyme abundance increases with depth as a response to plastic pollution and not merely taxonomic composition. By obtaining further pollution measurements, we observed that the abundance of the uncovered enzymes in both ocean and soil habitats significantly correlates with marine and country-specific plastic pollution trends. Our study thus uncovers the earth microbiome's potential to degrade plastics, providing evidence of a measurable effect of plastic pollution on the global microbial ecology as well as a useful resource for further applied research. IMPORTANCE: Utilization of synthetic biology approaches to enhance current plastic degradation processes is of crucial importance, as natural plastic degradation processes are very slow. For instance, the predicted lifetime of a polyethylene terephthalate (PET) bottle under ambient conditions ranges from 16 to 48 years. Moreover, although there is still unexplored diversity in microbial communities, synergistic degradation of plastics by microorganisms holds great potential to revolutionize the management of global plastic waste. To this end, the methods and data on novel plastic-degrading enzymes presented here can help researchers by (i) providing further information about the taxonomic diversity of such enzymes as well as understanding of the mechanisms and steps involved in the biological breakdown of plastics, (ii) pointing toward the areas with increased availability of novel enzymes, and (iii) giving a basis for further application in industrial plastic waste biodegradation. Importantly, our findings provide evidence of a measurable effect of plastic pollution on the global microbial ecology.

    Data mining of Saccharomyces cerevisiae mutants engineered for increased tolerance towards inhibitors in lignocellulosic hydrolysates

    The use of renewable plant biomass, lignocellulose, to produce biofuels and biochemicals using microbial cell factories plays a fundamental role in the future bioeconomy. The development of cell factories capable of efficiently fermenting complex biomass streams will improve the cost-effectiveness of microbial conversion processes. At present, inhibitory compounds found in hydrolysates of lignocellulosic biomass substantially influence the performance of a cell factory and the economic feasibility of lignocellulosic biofuels and chemicals. Here, we present and statistically analyze data on Saccharomyces cerevisiae mutants engineered for altered tolerance towards the most common inhibitors found in lignocellulosic hydrolysates: acetic acid, formic acid, furans, and phenolic compounds. We collected data from 7971 experiments including single overexpression or deletion of 3955 unique genes. The mutants included in the analysis had been shown to display increased or decreased tolerance to individual inhibitors or combinations of inhibitors found in lignocellulosic hydrolysates. Moreover, the data included mutants grown on synthetic hydrolysates, in which inhibitors were added at concentrations that mimicked those of lignocellulosic hydrolysates. Genetic engineering aimed at improving inhibitor or hydrolysate tolerance was shown to alter the specific growth rate or length of the lag phase, cell viability, and vitality, block fermentation, and decrease product yield. Different aspects of strain engineering aimed at improving hydrolysate tolerance, such as choice of strain and experimental set-up are discussed and put in relation to their biological relevance. While successful genetic engineering is often strain and condition dependent, we highlight the conserved role of regulators, transporters, and detoxifying enzymes in inhibitor tolerance. The compiled meta-analysis can guide future engineering attempts and aid the development of more efficient cell factories for the conversion of lignocellulosic biomass.

    Performance of regression models as a function of experiment noise

    A challenge in developing machine learning regression models is that it is difficult to know whether maximal performance has been reached on a particular dataset, or whether further model improvement is possible. In biology this problem is particularly pronounced as sample labels (response variables) are typically obtained through experiments and therefore have experiment noise associated with them. Such label noise puts a fundamental limit to the performance attainable by regression models. We address this challenge by deriving a theoretical upper bound for the coefficient of determination (R²) for regression models. This theoretical upper bound depends only on the noise associated with the response variable in a dataset as well as its variance. The upper bound estimate was validated via Monte Carlo simulations and then used as a tool to bootstrap performance of regression models trained on biological datasets, including protein sequence data, transcriptomic data, and genomic data. Although we study biological datasets in this work, the new upper bound estimates will hold true for regression models from any research field or application area where response variables have associated noise.
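
The idea that label noise caps the attainable coefficient of determination can be illustrated with a small Monte Carlo sketch. This is not the derivation or the exact bound from the paper, which the abstract does not state. It assumes the simplest textbook setting: additive, independent Gaussian noise of known standard deviation on each label, for which a perfect model of the noise-free signal is expected to score about 1 - noise variance / label variance. The function names and default parameters are hypothetical and chosen only for illustration.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def simulate_r2_ceiling(n=20000, signal_sd=2.0, noise_sd=1.0, seed=0):
    """Compare the R2 of a perfect model against the noise-based ceiling.

    Assumption (not taken from the abstract): labels are the true signal
    plus independent Gaussian noise with a known standard deviation.
    """
    rng = random.Random(seed)
    # Noise-free signal that an ideal model would predict exactly.
    signal = [rng.gauss(0.0, signal_sd) for _ in range(n)]
    # Measured labels: the signal plus experimental noise.
    observed = [s + rng.gauss(0.0, noise_sd) for s in signal]
    # R2 obtained by the ideal model when scored against the noisy labels.
    empirical = r2_score(observed, signal)
    # Ceiling computed only from the noise variance and the label variance.
    expected = 1.0 - noise_sd ** 2 / statistics.pvariance(observed)
    return empirical, expected


if __name__ == "__main__":
    empirical, expected = simulate_r2_ceiling()
    print(f"R2 of perfect model: {empirical:.3f}")
    print(f"Noise-based ceiling: {expected:.3f}")
```

Under these assumptions the two printed values should agree closely, and raising `noise_sd` relative to `signal_sd` lowers both, which is the qualitative behavior the abstract describes.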

    Bayesian genome scale modelling identifies thermal determinants of yeast metabolism

    The molecular basis of how temperature affects cell metabolism has been a long-standing question in biology, where the main obstacles are the lack of high-quality data and methods to associate temperature effects on the function of individual proteins as well as to combine them at a systems level. Here we develop and apply a Bayesian modeling approach to resolve the temperature effects in genome scale metabolic models (GEM). The approach minimizes uncertainties in enzymatic thermal parameters and greatly improves the predictive strength of the GEMs. The resulting temperature constrained yeast GEM uncovers enzymes that limit growth at superoptimal temperatures, and squalene epoxidase (ERG1) is predicted to be the most rate limiting. By replacing this single key enzyme with an ortholog from a thermotolerant yeast strain, we obtain a thermotolerant strain that outgrows the wild type, demonstrating the critical role of sterol metabolism in yeast thermosensitivity. Therefore, apart from identifying thermal determinants of cell metabolism and enabling the design of thermotolerant strains, our Bayesian GEM approach facilitates modelling of complex biological systems in the absence of high-quality data and therefore shows promise for becoming a standard tool for genome scale modeling.